Abstract

Problem Statement: Reliability, which refers to the degree to which 
measurement results are free from measurement errors, as well as its 
estimation, is an important issue in psychometrics. Several methods for 
estimating reliability have been suggested by various theories in the field 
of psychometrics. One of these theories is the generalizability theory. In 
generalizability theory, two distinct reliability coefficients are estimated: 
the generalizability coefficient (G coefficient) for relative evaluation, and 
the index of dependability (Phi coefficient) for absolute decisions. As in
all methods of reliability estimation, G and Phi coefficients are estimated
from a data set obtained by administering the instrument to a sample.
Therefore, determining the sample size needed to reliably estimate the
population's characteristics has been a critical issue.

Purpose of Study: The purpose of this study is to determine the adequate 
sample size required to ensure that the G and Phi coefficients obtained 
from a sample can estimate the G and Phi coefficients for the population in 
an unbiased way. 

Methods: A total of 480691 students who took Form A of the SBS test for
the 6th grade in 2008 were considered as the population of the study. Using
a bootstrap method, a total of 1200 samples were drawn at random from this
population: 100 replications at each of 12 sample sizes
(n=30, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000). Since the
test battery contained five subtests with distinct contents and numbers of
items, and all items were answered by all participants, a
$p^{\bullet} \times i^{\circ}$ multivariate G theory design was used. G and
Phi reliability coefficients were estimated both for the population and for
each of the samples drawn at the 12 sample sizes. The relative root mean
square error (R-RMSE) index was used as the error index to analyze the
consistency of the G and Phi coefficients with the G and Phi parameters
estimated for the population.

Findings and Results: It was found that the G and Phi coefficients estimated 
for a sample size of 30 tended to be less than the G and Phi parameters, 
and the R-RMSE value was greater than .01. When the sample size was 50 
or more, R-RMSE values were less than .01. Thus it can be said that G and
Phi coefficients estimated from samples of 50 or more are robust estimators
of the G and Phi parameters. Moreover, it was concluded that where the
sample size is 400 or greater, R-RMSE values become stable. A sample size
of 400 yields more exact and robust estimates of the G and Phi parameters,
and increasing the sample size over 400 does not make a significant
contribution to the unbiased estimation of G and Phi parameters.

Conclusions and Recommendations: A sample size of 30 does not provide an 
adequately unbiased estimation of G and Phi coefficients. It can be 
recommended that sample sizes of 50 to 300 are adequate for a robust 
estimation of G and Phi coefficients; however, a more exact and robust 
estimation requires a sample size of 400. In future research, the sample 
size for facets using different designs of G theory can be studied. 

Keywords: Generalizability theory, sample size, generalizability coefficient,
Phi coefficient


Because measurement errors are present in educational and psychological
measurements, true scores cannot be obtained directly. When measuring a variable it is
desirable to obtain scores as close to the true score as possible.
Therefore, reliability, which refers to the degree to which measurement results are 
free from measurement errors, as well as its estimation, play central roles in 
psychometrics. Several methods for estimating reliability have been suggested by 
various theories in the field of psychometrics. These methods of estimating reliability 
are statistics estimated based on a data set obtained from a sample as a result of 
administering an instrument. Therefore, it is critical to determine the sample size in 
order to estimate the reliability of the population. An adequate sample size must be 
used in order to accurately estimate reliability while ensuring economy in 
administering the instrument. 

There are many suggestions in the psychometric literature about adequate 
sample sizes required to estimate reliability. Kline (1986), for example, reports that
samples in reliability analysis must contain 200 or more data points. On the other
hand, Nunnally and Bernstein (1994) stress that a large sample size should be used to
minimize sampling errors and thus estimate reliability coefficients accurately, and
suggest that sample sizes should be 300 or more. However, Segall (1994) states that a
sample size of 300 is small for reliability estimation. Charter (1999, 2003)
recommends a sample size of 400 to estimate the population reliability precisely. On
the other hand, Yurdugül (2008) reported that if the first eigenvalue of a
measurement cluster is greater than 6, a sample size of 30 is adequate; if the
eigenvalue is between 3 and 6, a sample size of 100 is adequate; and if the eigenvalue
is less than 3, a sample size of 300 or more is adequate. No upper limit is
recommended for the sample size in the reliability literature, but the ideal sample size
has been discussed. In addition, Feldt and Ankenmann (1998, 1999) stated that it is
ill-advised to employ a sample size of less than 30, and Charter (2008) also suggested
that sample sizes below this threshold are unwise. One reliability estimation theory is
Generalizability Theory, which was developed by Lee J. Cronbach et al. in 1972 in
response to the shortcomings of classical test theory (Crocker & Algina, 1986; Shavelson &
Webb, 1991; Nunnally & Bernstein, 1994; Brennan, 2001a).

Generalizability Theory (G Theory) enables the assessment of reliability in 
behavioural measurements, and the design, research, and conceptualization of 
reliable observations (Shavelson & Webb, 1991; Brennan, 2001a). G Theory was first
put forward by Cronbach et al. as a reaction to the shortcomings of the still popular
true score model of classical reliability theory. Classical reliability theory treats
the errors inherent in the measurement results as coming from a single
source. In contrast, G theory considers the errors coming from all potential error
sources together, as well as their interaction effects (Brennan, 2011). The purpose of
G theory is to generalize the observed scores of measured subjects to the population
scores accurately by defining and interpreting the measurement results and
distinguishing different sources of variance. G theory assumes that the reliability of
an observation depends on the studied population (Crocker & Algina, 1986;
Shavelson & Webb, 1991; Brennan, 2001a).

G theory takes into consideration two means of estimating reliability in education 
and psychology: relative and absolute evaluation. Therefore, in G theory two distinct 
reliability coefficients are estimated: a generalizability coefficient (G coefficient) for 
relative evaluations, and an index of dependability (Phi coefficient) for absolute 
decisions (Crocker & Algina, 1986; Shavelson & Webb, 1991; Brennan, 2001a). 

The G coefficient, used for relative evaluations and symbolized by $E\rho^2$, is defined
as the ratio of universe score variance [$\sigma^2(\tau)$] to the sum of that
variance and relative error variance [$\sigma^2(\delta)$]:

$$E\rho^2 = \frac{\sigma^2(\tau)}{\sigma^2(\tau) + \sigma^2(\delta)} \qquad (1)$$

The Phi coefficient, used for absolute evaluations and symbolized by $\Phi$, is defined
as the ratio of universe score variance [$\sigma^2(\tau)$] to the sum of that
variance and absolute error variance [$\sigma^2(\Delta)$]:

$$\Phi = \frac{\sigma^2(\tau)}{\sigma^2(\tau) + \sigma^2(\Delta)} \qquad (2)$$

(Shavelson & Webb, 1991; Brennan, 2001a).
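To make the two ratios concrete, the following is a minimal Python sketch for the simplest case, a univariate crossed persons-by-items design. It is not the multivariate design analyzed with mGENOVA in this study. The function name `g_and_phi`, the ANOVA-based variance component estimates, the clipping of negative estimates to zero, and the small score matrix are all illustrative assumptions.

```python
def g_and_phi(scores: list[list[int]]) -> tuple[float, float]:
    """Estimate (G, Phi) for a crossed persons x items design.

    Assumption: variance components come from the usual two-way ANOVA
    mean squares with one observation per cell.
    """
    n_p, n_i = len(scores), len(scores[0])
    if n_p < 2 or n_i < 2:
        raise ValueError("need at least two persons and two items")

    grand = sum(map(sum, scores)) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Negative estimates are clipped to zero (an assumption of this sketch).
    var_p = max((ms_p - ms_res) / n_i, 0.0)  # universe score variance
    var_i = max((ms_i - ms_res) / n_p, 0.0)  # item variance
    var_res = ms_res                         # interaction confounded with error

    rel_err = var_res / n_i                  # relative error variance
    abs_err = (var_i + var_res) / n_i        # absolute error variance
    if var_p + rel_err == 0:
        raise ValueError("scores have no variance; coefficients are undefined")
    return var_p / (var_p + rel_err), var_p / (var_p + abs_err)


# Hypothetical 0/1 score matrix: 5 persons (rows) by 4 items (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
g, phi = g_and_phi(scores)
print(round(g, 3), round(phi, 3))
```

Because absolute error variance adds the item variance to the relative error variance, Phi can never exceed G for the same data, which matches the ordering of the population values reported in this study.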

Cronbach et al. (1972) warn that variance components used in estimating the G 
and Phi coefficients can be unstable, depending on the sample size. Smith (1978) 
stresses that a small sample does not provide a sound basis for
estimating the G and Phi coefficients, and that with a small sample the G and Phi
coefficients will not be stable. The issue of adequate sample size in estimating the G
and Phi coefficients needs to be studied within generalizability theory (Shumate, 
Surles, Johnson & Penny, 2007). This study investigated the adequate sample size 
that will ensure that the G and Phi coefficients obtained from the sample can estimate 
the G and Phi coefficients for the population in an unbiased way. 


Method 

The Instrument and Data Collection 

The results of the SBS test for the 6th grade held by the Ministry of National
Education (MoNE) in 2008 were used in the study. This test consisted of five subtests 
with 80 multiple-choice items (4 choices per item). The Turkish subtest consisted of 
19 items, and the Math, Science, and Social Studies subtests comprised 16 items each. 
The foreign language subtest contained 13 items. The answers (A, B, C, and D) given 
by 480691 students on "Form A" of the test were converted into a 1-0 matrix 
according to the answer keys of the relevant subtests, and this matrix was used in the 
study. 
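The conversion of lettered answers into a 1-0 matrix can be sketched as follows. This is a minimal illustration, not the scoring routine actually used: the four-item answer key, the two students, and the treatment of an omitted answer as 0 are hypothetical assumptions.

```python
from typing import Sequence


def score_responses(responses: Sequence[str], key: Sequence[str]) -> list[int]:
    """Return 1 for each response that matches the answer key, else 0."""
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return [1 if r == k else 0 for r, k in zip(responses, key)]


# Hypothetical answer key and responses; "" stands for an omitted item,
# which this sketch scores as 0 (an assumption, not stated in the study).
answer_key = ["A", "C", "B", "D"]
students = [
    ["A", "C", "D", "D"],
    ["B", "C", "B", ""],
]
score_matrix = [score_responses(s, answer_key) for s in students]
print(score_matrix)
```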

Population and Samples

The study population consisted of a total of 480691 students who took Form A of
the SBS test for the 6th grade in 2008. Using a bootstrap method, a total of 1200
samples were drawn at random from this population: 100 replications at each of
12 sample sizes (n=30, 50, 100, 200, 300, 400, 500, 600, 700, 800,
900, 1000). Analyses were carried out
on the 1200 samples selected in this manner.
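The sampling scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure actually run: students are represented by row indices, each sample is drawn by simple random sampling without replacement, and the function name `draw_samples` and the seed are hypothetical.

```python
import random

SAMPLE_SIZES = [30, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
REPLICATIONS = 100
POPULATION_SIZE = 480691


def draw_samples(population_size, sizes=SAMPLE_SIZES,
                 replications=REPLICATIONS, seed=0):
    """Map each sample size to a list of index samples from the population."""
    rng = random.Random(seed)
    population = range(population_size)
    # Assumption: no student appears twice within a single sample.
    return {
        n: [rng.sample(population, n) for _ in range(replications)]
        for n in sizes
    }


samples = draw_samples(POPULATION_SIZE)
total_samples = sum(len(reps) for reps in samples.values())
print(total_samples)  # 12 sample sizes x 100 replications = 1200
```

Each list of indices would then select the corresponding rows of the 1-0 score matrix before the coefficients are estimated.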

Data Analysis 

The test battery used in the study consists of five subtests. In this test, a
different set of items is nested within each level of the fixed facet, namely the
Turkish, Math, Science, Social Studies, and Foreign Language subtests. Brennan (2001) states
that a model consisting of such subtests and items is called a "table of specifications."
In this case, if all students taking the test (p) answer all of the items (i) in each of the
subtests (h), the model is defined as p × (i:h), where "×" denotes "crossed with" and
":" denotes "nested within". If the number of items in each subtest is equal, the test
follows a balanced design. However, when the number of items in each subtest is
unequal, it becomes an unbalanced design with a mixed model. Brennan (2001, p. 86)
suggests that "unbalanced designs with mixed models are best treated using
multivariate generalizability theory." Since the test used in this study consisted of
five subtests with different numbers of items and all items were answered by all students,
a $p^{\bullet} \times i^{\circ}$ multivariate G theory design was used (Brennan, 2001a; 2001b). In this
notation, a superscript filled circle (•) shows that the facet is crossed with the fixed
multivariate variables, and a superscript empty circle (°) shows that the facet is nested
within the fixed multivariate variables. The analyses were done using the
PC version of mGENOVA 2.1. G and Phi coefficients were first calculated for the
population. Next, G and Phi coefficients were estimated for each of the 1200 samples,
that is, 100 replications at each of 12 sample sizes (n=30, 50, 100, 200, 300, 400, 500, 600,
700, 800, 900, 1000). Then the consistency between the
G and Phi coefficients estimated from the 100 samples at each sample size
and the G and Phi
parameters calculated for the population was analyzed. The relative root mean
square error (R-RMSE) index was used as the error index for G and Phi coefficients.


$$\text{R-RMSE}(E\hat{\rho}^2_i) = \sqrt{\frac{1}{M}\sum_{j=1}^{M}\left(\frac{E\hat{\rho}^2_{ij} - E\rho^2}{E\rho^2}\right)^2} \qquad (3)$$

$$\text{R-RMSE}(\hat{\Phi}_i) = \sqrt{\frac{1}{M}\sum_{j=1}^{M}\left(\frac{\hat{\Phi}_{ij} - \Phi}{\Phi}\right)^2} \qquad (4)$$

In these equations $E\rho^2$ represents the G coefficient of the population, and $\Phi$
represents the Phi coefficient of the population. $E\hat{\rho}^2_{ij}$ and $\hat{\Phi}_{ij}$ respectively represent
the G and Phi coefficients estimated from the jth sample for sample size i. The M in
the equations represents the number of replications selected for each of the sample
sizes using the simple random sampling method. In this study, M=100 samples
were selected for each of the sample sizes (n=30, 50, 100, 200, 300, 400, 500,
600, 700, 800, 900, 1000) using the simple random sampling method.

For sample size studies based on simulation, R-RMSE is taken into consideration
(Yurdugül, 2008), and an R-RMSE value closer to zero indicates robust estimation of
a parameter (Yurdugül, 2009). If the estimated G and Phi values are equal to the G
and Phi parameters, this indicates excellent consistency and the R-RMSE value is
zero. As the R-RMSE values calculated as an error index get closer to zero, the G and
Phi coefficients estimated from the samples can be said to be more robust estimators
of the population G and Phi parameters. In this study it was assumed that when
R-RMSE values are less than .01, the estimated G and Phi coefficients are robust
estimators of the real G and Phi parameters.
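The error index and the .01 criterion can be sketched as below. The sketch assumes the usual relative RMSE form, that is, the square root of the mean squared relative deviation of the M sample estimates from the population parameter. The function names and the three replicate estimates are hypothetical; the value .95774 is the population G parameter reported in the Findings.

```python
import math
from typing import Sequence

THRESHOLD = 0.01  # robustness criterion adopted in this study


def r_rmse(estimates: Sequence[float], parameter: float) -> float:
    """Relative root mean square error around a nonzero parameter."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    squared_relative_errors = [
        ((estimate - parameter) / parameter) ** 2 for estimate in estimates
    ]
    return math.sqrt(sum(squared_relative_errors) / len(estimates))


def is_robust(estimates: Sequence[float], parameter: float,
              threshold: float = THRESHOLD) -> bool:
    """True when the R-RMSE of the estimates falls below the threshold."""
    return r_rmse(estimates, parameter) < threshold


# Hypothetical G estimates from three replications at one sample size.
hypothetical_g = [0.951, 0.960, 0.955]
population_g = 0.95774
print(r_rmse(hypothetical_g, population_g), is_robust(hypothetical_g, population_g))
```

Dividing each deviation by the parameter makes the index unit-free, so the same threshold can be applied to G and to Phi even though their population values differ.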


Findings and Results 

G and Phi parameters were calculated for the data set obtained from the
population of 480691 using the $p^{\bullet} \times i^{\circ}$ multivariate G theory design. The G parameter
value was calculated as .95774 and the Phi parameter value as .95397. Next, a total
of 1200 samples were drawn from this population using a random sampling
method: 100 samples at each of 12 sample sizes (n=30, 50, 100, 200,
300, 400, 500, 600, 700, 800, 900, 1000). G and Phi
coefficients were estimated for each of the samples selected. A graph (see Figure 1)
was produced to show how the G and Phi coefficients estimated from the
samples changed with sample size.



Figure 1. G and Phi coefficients estimated from different sample sizes (horizontal axis: sample size, 30 to 1000)
Figure 1 shows that G and Phi coefficients estimated from different sample sizes
get closer to each other and form a narrowing cone as sample size increases. It was
found that the G and Phi coefficients estimated for sample sizes of 30 tended to be
lower than the G and Phi parameters ($E\rho^2$ = .95774 and $\Phi$ = .95397). However,
when the sample size was increased to 50, 100, 200, or 300, the consistency of
both the estimated G and Phi coefficients increased and the estimates moved closer to the
parameter values. Figure 1 also shows that when the sample size is 400, 500, 600, 700,
800, 900, or 1000, the estimated G and Phi coefficients are more stable, and when the
sample size is increased over 400 the consistency of the estimated G and Phi
coefficients does not increase significantly.

The relative root mean square error (R-RMSE) index was used as the error index 
to analyze the consistency of the G and Phi coefficients estimated for 12 different 
sample sizes (100 samples per size) selected from the population with the G and Phi 
parameters estimated for the population. The R-RMSE values of the G and Phi 
coefficients estimated for each sample size (n=30, 50, 100, 200, 300, 400, 500, 600, 700,
800, 900, 1000) are shown in Table 1.


Table 1

R-RMSE Values of G and Phi Coefficients Estimated According to Each Sample Size

Sample size (n)    R-RMSE, G coefficient ($E\rho^2$)    R-RMSE, Phi coefficient ($\Phi$)
30                 .01334                               .01437
50                 .00758                               .00842
100                .00606                               .00673
200                .00376                               .00422
300                .00286                               .00316
400                .00234                               .00259
500                .00201                               .00219
600                .00188                               .00206
700                .00171                               .00189
800                .00170                               .00186
900                .00166                               .00180
1000               .00140                               .00152


The minimum and maximum R-RMSE values of the G coefficients estimated for 
each of the sample sizes ranged between .00140 and .01334. The minimum and 
maximum R-RMSE values of the Phi coefficients estimated for each of the sample 
sizes ranged between .00152 and .01437. The R-RMSE values given in Table 1 were 
found to be greater than .01 for both G and Phi coefficients when the sample size is 
30. Thus it can be said that a sample size of 30 is too small to estimate the G and Phi 
coefficients; an adequately unbiased estimation is not possible when the sample size 
is 30. When the sample size is 50 or greater (n=50, 100, 200, 300, 400, 500, 600, 700,
800, 900, 1000), the R-RMSE values were found to be less than .01, which suggests
that G and Phi coefficients estimated from these sample sizes are robust estimators of
G and Phi parameters.



Figure 2. Change in R-RMSE values for G and Phi coefficients estimated, by sample 
size 


Figure 2 shows how R-RMSE values of estimated G and Phi coefficients change
according to sample size. As mentioned earlier, when the sample size is 50 or more,
R-RMSE values drop below .01. It can be said that when the sample size is 50 or
more, G and Phi coefficients can be estimated in an unbiased way, and coefficients
estimated from samples of these sizes are robust estimators of G and Phi parameters. On the other hand, as shown in
Figure 2, when the sample size is 400 or more (n=400, 500, 600, 700, 800, 900, 1000),
R-RMSE values become stable and do not change significantly. This suggests that
increasing the sample size over 400 does not significantly improve the unbiased
estimation of G and Phi parameters.
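Both observations can be checked directly against the values in Table 1. The short Python sketch below is illustrative only: the dictionary transcribes Table 1, while the variable names and the use of successive differences as a measure of "becoming stable" are assumptions of the sketch, not part of the original analysis.

```python
# R-RMSE values transcribed from Table 1: sample size -> (G, Phi).
TABLE_1 = {
    30: (0.01334, 0.01437),
    50: (0.00758, 0.00842),
    100: (0.00606, 0.00673),
    200: (0.00376, 0.00422),
    300: (0.00286, 0.00316),
    400: (0.00234, 0.00259),
    500: (0.00201, 0.00219),
    600: (0.00188, 0.00206),
    700: (0.00171, 0.00189),
    800: (0.00170, 0.00186),
    900: (0.00166, 0.00180),
    1000: (0.00140, 0.00152),
}
THRESHOLD = 0.01  # robustness criterion adopted in this study

# Sample sizes at which both coefficients meet the criterion.
robust_sizes = [n for n, values in TABLE_1.items() if max(values) < THRESHOLD]

# Drop in R-RMSE (G, Phi) when moving from each sample size to the next.
sizes = sorted(TABLE_1)
drops = {
    (a, b): tuple(round(x - y, 5) for x, y in zip(TABLE_1[a], TABLE_1[b]))
    for a, b in zip(sizes, sizes[1:])
}

print(robust_sizes)
for step, (drop_g, drop_phi) in drops.items():
    print(step, drop_g, drop_phi)
```

With the Table 1 values, every step beyond n=400 lowers R-RMSE by at most .0004, whereas the step from 30 to 50 lowers it by about .006.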
Conclusions and Recommendations

In this study, a data set of 480691 students who took a test containing
subtests with dichotomous (1-0) scoring was taken as the population.
Using a bootstrap method, a total of 1200 samples were drawn at random from this
population: 100 replications at each of 12 sample sizes
(n=30, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000).
Since the test battery contained five subtests with distinct
contents and numbers of items, and all items were answered by all participants,
a $p^{\bullet} \times i^{\circ}$ multivariate G theory design was used. Based on this design, the G and Phi
coefficients estimated from different sample sizes were compared to the G and Phi 
parameters estimated for the population. As a result, it was concluded that when the 
sample size is 30, estimated G and Phi coefficients are not robust estimators of G and 
Phi parameters, and the R-RMSE value is greater than .01. This result supports that of
Feldt and Ankenmann (1998, 1999), who suggested that it is ill-advised to estimate
reliability if the sample size is less than 30, as well as the conclusion reached by
Charter (2008), who suggested that it is not wise to allow sample sizes below 30. It
also verifies the warning by Cronbach et al. (1972) that variance components used in 
estimating the G and Phi coefficients can be unstable depending on the sample size, 
as well as the suggestion by Smith (1978) that when a small sample size is used G 
and Phi coefficients will not be stable. It is seen in this study that the R-RMSE values 
calculated for the G and Phi coefficients estimated for sample sizes of 50, 100, 200, 
and 300 are less than .01; i.e., they are robust estimators of G and Phi parameters. 
This finding is consistent with the findings of previous research, including that of 
Kline (1986), who recommended a sample size of 200 for reliability studies; Yurdugül
(2008), who recommended a sample size of 300 or more when the first eigenvalue is
less than 3; and Nunnally and Bernstein (1994), who recommended a sample size of
300 or more. However, the findings of this study do not support Segall's (1994)
suggestion that a sample size of 300 is small for reliability estimation. As a matter of 
fact, this study found that a sample size of 300 is enough to make an adequately 
unbiased reliability estimation (G and Phi). On the other hand, it was observed that
when the sample size is 400 or more (n=400, 500, 600, 700, 800, 900, 1000), the
estimated G and Phi coefficients move considerably closer to the G and Phi
parameters and become stable. The R-RMSE values calculated for these sample
sizes were quite small, around .002, which suggests that these sample sizes offer
more exact and robust estimation of G and Phi coefficients. This shows that a
sample size of 400 yields more exact and robust estimates of the G and Phi
parameters, supporting Charter (1999), who
recommends a sample size of 400 to estimate the population reliability precisely.
Moreover, it was seen that increasing the sample size over 400 does not make a
significant contribution to the unbiased estimation of G and Phi parameters.

As a result, it was found that when the sample size is as small as 30, the G and
Phi coefficients cannot be estimated in a stable way. On the other hand, it was
concluded that when the sample size is 50, 100, 200, or 300, G and Phi coefficients can
be estimated in an adequately unbiased way. Given a sample size of 400, the
estimates of the G and Phi coefficients are more exact and robust.
Nevertheless, it can be said that increasing the sample size over 400 does not make a
significant contribution to the unbiased estimation of G and Phi coefficients. A
sample size of 50 to 300 can be considered adequate for the robust estimation of G and
Phi coefficients; however, a more exact and robust estimation requires a sample size
of 400.

In this study, a $p^{\bullet} \times i^{\circ}$ multivariate G theory design was used to estimate the G
and Phi coefficients for a sample of persons. By its nature, G theory estimates single G
and Phi coefficients by evaluating different error sources together. Therefore, the
sample sizes for other facets, such as items, occasions, and raters, can be
studied with different G theory designs in future research.
Sample Size for Estimation of G and Phi Coefficients in Generalizability Theory

Citation:

Atilgan, H. (2013). Sample size for estimation of g and phi coefficients in
generalizability theory. Egitim Arastirmalari-Eurasian Journal of Educational
Research, 51, 215-228.


(Summary)

Problem Statement

Because of the measurement errors that enter educational and psychological measurement
results, the true score cannot be reached through measurement. It is desirable to obtain
measurement results as close as possible to the true score of the measured trait. For this
reason, the concept of reliability, which refers to the degree to which measurement
results are free from measurement errors, and the estimation of reliability hold an
important place in psychometrics. Indeed, many reliability estimation methods have been
proposed by the theories developed in the field of psychometrics. One of the theories that
proposes a reliability estimation method is Generalizability Theory. In generalizability
theory, two different reliability coefficients are calculated: the generalizability (G)
coefficient for relative evaluations and the dependability (Phi) coefficient for absolute
evaluations. As in all reliability estimation methods, the G and Phi coefficients in
generalizability theory are statistics calculated from the sample score distribution
obtained by administering the instrument to a sample of individuals. Therefore, the sample
size needed to estimate the population reliability has long been an important question.
In general, the psychometric literature offers differing recommendations on the sample
size needed in reliability estimation studies.
Purpose of the Study

In generalizability theory, the variance components used in calculating the G and Phi
coefficients can vary depending on the sample size. If the sample size is not adequate,
the G and Phi coefficients cannot be estimated accurately. For this reason, the appropriate
sample size for estimating the G and Phi coefficients is an area that needs to be studied
within generalizability theory. This study investigated what the sample size should be so
that the G and Phi coefficients obtained from a sample can estimate the population G and
Phi coefficients in an unbiased way.

Method

The 480691 people who took Form A of the 6th grade Level Determination Exam (SBS) held in
2008 were taken as the population. From this data set, a total of 1200 samples were drawn
by simple random selection using a bootstrap method: 100 samples at each of 12 sample
sizes (n=30, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000). Because the test from
which the data were obtained contained five subtests with different content and unequal
numbers of items, and all individuals answered all items, a $p^{\bullet} \times i^{\circ}$
multivariate G theory design was used. G and Phi coefficients were calculated for the
population and for the samples drawn from it at the 12 sample sizes. The relative root
mean square error (R-RMSE) was used as the error index to examine the consistency of the
G and Phi coefficients with the G and Phi parameters calculated for the population. As
the R-RMSE values approach zero, the G and Phi coefficients estimated from the samples
can be said to be robust estimators of the G and Phi parameters. In this study, the
estimated G and Phi coefficients were accepted as robust estimators of the G and Phi
parameters when the R-RMSE values were less than .01.

Findings

The G and Phi coefficients estimated at a sample size of 30 tended to be lower than the G
and Phi parameters, and the R-RMSE value was greater than .01. As the sample size increased
to 50, 100, 200, and 300, the consistency of both the estimated G and the estimated Phi
coefficients increased, and the estimates gradually approached the parameter values.
Because R-RMSE values were less than .01 at sample sizes of 50 and above, the G and Phi
coefficients can be said to be robust estimators of the G and Phi parameters. In addition,
the estimated G and Phi coefficients behaved more stably at sample sizes of 400, 500, 600,
700, 800, 900, and 1000, but increasing the sample size beyond 400 did not change the
consistency of the estimated G and Phi coefficients much in relative terms. At a sample
size of 400 the G and Phi parameters were estimated more exactly and more robustly, and
increasing the sample size beyond 400 did not make an important contribution to the
unbiased estimation of the G and Phi parameters.
Conclusions and Recommendations

It was seen that the G and Phi coefficients cannot be estimated stably when the sample is
as small as 30. On the other hand, it was concluded that with sample sizes of 50, 100, 200,
and 300 the G and Phi coefficients can be estimated in an adequately unbiased way, whereas
at a sample size of 400 the G and Phi coefficients are more exact and more robust.
Increasing the sample size beyond 400 can be said not to contribute to the unbiased
estimation of the G and Phi coefficients. A sample size between 50 and 300 can be
recommended for robust estimation of the G and Phi coefficients, and a sample size of 400
for more exact and more robust estimation.

In this study, the estimation of G and Phi coefficients was examined on a sample of
persons using the $p^{\bullet} \times i^{\circ}$ multivariate G theory design. By its nature,
G theory estimates a single G coefficient and a single Phi coefficient by evaluating
different error sources together. Therefore, sample sizes for these error sources can be
studied in different G theory designs that include other error sources such as items,
occasions, and raters.

Keywords: Generalizability Theory, Sample Size, Generalizability Coefficient,
Phi Coefficient









